# COVID-19 Second Generation Surveillance System (COVIDSGSS) Dataset
## 1. Summary
The information below is retrieved from the Health Data Gateway API developed by NHS England, with additional fields added by UK LLC (indicated by italics). 

In [5]:
# define target dataset to document
schema = 'nhsd'
table = 'COVIDSGSS'
version = 'v0003'
# import functions from script helper
import sys
script_fp = "../../../../scripts/"
sys.path.insert(0, script_fp)
from data_doc_helper import DocHelper
# create instance
document = DocHelper(schema, table, version, script_fp)
# markdown/code hybrid cell module requirement
from IPython.display import display, Markdown

In [6]:
# get api data
dataset = document.get_api_data()
display(Markdown("**NHS England title of dataset:** "+dataset['datasetfields']['metadataquality']['title']))
display(Markdown("***Dataset name in UK LLC TRE:*** *nhsd.COVIDSGSS*"))  
display(Markdown("**Short abstract:** "+dataset['datasetfields']['abstract']))
display(Markdown("***Extended abstract:*** *The UK Health Security Agency's (UKHSA) Second Generation Surveillance System (SGSS) is used to capture routine laboratory surveillance data on infectious diseases from diagnostic laboratories across England. Diagnostic laboratories are required to notify the UKHSA when specified causative agents are found in a human sample. The COVIDSGSS data reflect swab testing offered to those in hospital and NHS key workers (i.e. Pillar 1) and the wider community at drive-through test centres, walk-in centres, home kits returned by post, care homes, etc. (i.e. Pillar 2).*"))
display(Markdown("**Geographical coverage:** "+dataset['datasetfields']['geographicCoverage'][0]))
display(Markdown("**Temporal coverage:** "+dataset['datasetfields']['datasetStartDate']))
display(Markdown("***Data available in UK LLC TRE from:*** *06/04/2020 onwards*"))
display(Markdown("**Typical age range:** "+dataset['datasetfields']['ageBand']))
display(Markdown("**Collection situation:** "+dataset['datasetv2']['provenance']['origin']['collectionSituation'][0]))
display(Markdown("**Purpose:** "+dataset['datasetv2']['provenance']['origin']['purpose'][0]))
display(Markdown("**Source:** "+dataset['datasetv2']['provenance']['origin']['source'][0]))
display(Markdown("**Pathway:** "+dataset['datasetv2']['coverage']['pathway']))
display(Markdown("***Information collected:*** *Demographic information about people who test positive for SARS-CoV-2. For a full list of variables see the [NHS England Metadata dashboard](https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services/metadata-dashboard).*"))
display(Markdown("***Structure of dataset:*** *Each line represents one participant.*"))  
display(Markdown("***Update frequency in UK LLC TRE:*** *Quarterly*"))  
display(Markdown("***Dataset versions in UK LLC TRE:*** *TBC*"))
display(Markdown("***Data quality issues:*** *TBC*"))  
display(Markdown("***Restrictions to data usage:*** *Research must be related to COVID-19 and be for medical purposes only (medical research) as defined in the NHS Act 2006: [https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information](https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information)*"))
display(Markdown("***Further information:*** *[https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/directions-and-data-provision-notices/data-provision-notices-dpns/sgss-and-sari-watch-data](https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/directions-and-data-provision-notices/data-provision-notices-dpns/sgss-and-sari-watch-data)*"))

**NHS England title of dataset:** Covid-19 Second Generation Surveillance System

***Dataset name in UK LLC TRE:*** *nhsd.COVIDSGSS*

**Short abstract:** Data forming the Covid-19 Second Generation Surveillance Systems data set relate to demographic and diagnostic information from Pillar 1 swab testing in PHE labs and NHS hospitals and Pillar 2 Swab testing in the community.

***Extended abstract:*** *The UK Health Security Agency's (UKHSA) Second Generation Surveillance System (SGSS) is used to capture routine laboratory surveillance data on infectious diseases from diagnostic laboratories across England. Diagnostic laboratories are required to notify the UKHSA when specified causative agents are found in a human sample. The COVIDSGSS data reflect swab testing offered to those in hospital and NHS key workers (i.e. Pillar 1) and the wider community at drive-through test centres, walk-in centres, home kits returned by post, care homes, etc. (i.e. Pillar 2).*

**Geographical coverage:** United Kingdom,England

**Temporal coverage:** 06/04/2020

***Data available in UK LLC TRE from:*** *06/04/2020 onwards*

**Typical age range:** 0-150

**Collection situation:** IN-PATIENTS

**Purpose:** CARE

**Source:** LIMS

**Pathway:** NOT APPLICABLE

***Information collected:*** *Demographic information about people who test positive for SARS-CoV-2. For a full list of variables see the [NHS England Metadata dashboard](https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services/metadata-dashboard).*

***Structure of dataset:*** *Each line represents one participant.*

***Update frequency in UK LLC TRE:*** *Quarterly*

***Dataset versions in UK LLC TRE:*** *TBC*

***Data quality issues:*** *TBC*

***Restrictions to data usage:*** *Research must be related to COVID-19 and be for medical purposes only (medical research) as defined in the NHS Act 2006: [https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information](https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information)*

***Further information:*** *[https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/directions-and-data-provision-notices/data-provision-notices-dpns/sgss-and-sari-watch-data](https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/directions-and-data-provision-notices/data-provision-notices-dpns/sgss-and-sari-watch-data)*
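
The summary cell above indexes directly into the nested API response, so a missing key or an empty list would raise an error and stop the cell. The following is a minimal sketch of a more defensive lookup. The helper name `get_nested`, the fallback text and the trimmed `sample` dictionary are assumptions made for illustration; they are not part of `DocHelper` or of the API itself.

```python
def get_nested(data, path, default="Not available"):
    """Walk a nested dict/list structure along `path`.

    Returns `default` as soon as any key or index along the path is missing.
    """
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical, heavily trimmed stand-in for the API response (illustration only).
sample = {
    "datasetfields": {
        "abstract": "Example abstract",
        "geographicCoverage": ["England"],
    }
}

abstract = get_nested(sample, ["datasetfields", "abstract"])
coverage = get_nested(sample, ["datasetfields", "geographicCoverage", 0])
# This path does not exist in `sample`, so the default is returned instead of an error.
pathway = get_nested(sample, ["datasetv2", "coverage", "pathway"])

print(abstract, coverage, pathway, sep=" | ")
```

With a helper like this, each `display(Markdown(...))` call could show a placeholder for a field the API does not return rather than failing part-way through the summary.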

## 2. Metrics
The tables below summarise the COVIDSGSS dataset in the UK LLC TRE.

**Table 1** Number of participants from each Longitudinal Population Study (LPS) represented in the COVIDSGSS dataset in the UK LLC TRE  
(**Note**: numbers relate to the most recent extract of NHS England data)

In [7]:
# count participants per cohort (LPS)
gb_cohort = document.get_cohort_count()
print(gb_cohort.to_markdown(index=False, tablefmt="fancy_grid"))
#display(gb_cohort)

╒════════════════╤═════════╕
│ cohort         │   count │
╞════════════════╪═════════╡
│ ALSPAC         │    2524 │
├────────────────┼─────────┤
│ BCS70          │    2201 │
├────────────────┼─────────┤
│ BIB            │    9398 │
├────────────────┼─────────┤
│ ELSA           │    1761 │
├────────────────┼─────────┤
│ EPICN          │    2405 │
├────────────────┼─────────┤
│ EXCEED         │    2946 │
├────────────────┼─────────┤
│ FENLAND        │    3135 │
├────────────────┼─────────┤
│ GLAD           │   31484 │
├────────────────┼─────────┤
│ MCS            │    6784 │
├────────────────┼─────────┤
│ NCDS58         │    1684 │
├────────────────┼─────────┤
│ NEXTSTEP       │    2070 │
├────────────────┼─────────┤
│ NIHRBIO_COPING │    7170 │
├────────────────┼─────────┤
│ NSHD46         │     404 │
├────────────────┼─────────┤
│ TEDS           │    3479 │
├────────────────┼─────────┤
│ TRACKC19       │    6363 │
├────────────────┼─────────┤
│ TWINSUK        │    4074 │
├─────────────

## 3. Helpful syntax
Below we will include syntax that may be helpful to other researchers in the UK LLC TRE. For longer scripts, we will include a snippet of the code plus a link to Git where you can find the full script. 
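
As a starting point, the sketch below shows one way per-cohort participant counts like those in Table 1 could be derived from row-level data. It is only an illustration: the function name, the `(cohort, participant_id)` row layout and the toy values are hypothetical, and it does not reproduce how `DocHelper.get_cohort_count()` works or the actual COVIDSGSS schema.

```python
from collections import defaultdict


def count_participants_by_cohort(rows):
    """Count distinct participants per cohort.

    `rows` is an iterable of (cohort, participant_id) pairs. A participant who
    appears on several rows is counted once within their cohort.
    """
    ids_by_cohort = defaultdict(set)
    for cohort, participant_id in rows:
        ids_by_cohort[cohort].add(participant_id)
    # Sort by cohort name so the output order is stable.
    return {cohort: len(ids) for cohort, ids in sorted(ids_by_cohort.items())}


# Toy rows invented for illustration; these are not real identifiers or counts.
example_rows = [
    ("ALSPAC", "a1"),
    ("ALSPAC", "a1"),  # repeated row for the same participant
    ("BCS70", "b1"),
    ("BCS70", "b2"),
]

cohort_counts = count_participants_by_cohort(example_rows)
for cohort, count in cohort_counts.items():
    print(f"{cohort}: {count}")
```

If the data are already in a pandas DataFrame, `df.groupby("cohort")["participant_id"].nunique()` gives the same distinct counts, again assuming those hypothetical column names.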